US-CL-CURRENT: 707/4
ABSTRACT: 

Systems and methods are herein disclosed for assessing the staleness of a web 
page. In particular, in one method of the present invention, the staleness of 
a web page is assessed by examining internal date references within the web 
page. In another method of the present invention, the staleness of a web page 
is assessed by examining the meta-data associated with the web page. In a 
further method of the present invention, the staleness of a hyperlinked web 
page is determined by examining the link status of the hyperlinks. If the web 
page has a relatively large number of dead links, it is assessed as being a 
stale web page. In a still further method of the present invention, the link 
status of web pages in the neighborhood of the web page being assessed is 
likewise examined. 
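The abstract names four staleness signals (internal date references, meta-data, dead links, and the link status of neighboring pages) but gives no concrete rules or thresholds. The Python sketch below shows one way each signal could be computed. It is illustrative only: the function names, the year regex, the two-year age limit, the 0.5 dead-link threshold, the use of the HTTP `Last-Modified` header as "meta-data", and the averaging over neighbors are all hypothetical choices, not values or procedures taken from the disclosure.

```python
import re
from email.utils import parsedate_to_datetime

# Hypothetical parameters: the disclosure gives no concrete values.
MAX_AGE_YEARS = 2
DEAD_LINK_THRESHOLD = 0.5

# Four-digit years from 1990 to 2099; a deliberately simple, assumed pattern.
YEAR_PATTERN = re.compile(r"\b(199\d|20\d\d)\b")


def latest_year_referenced(text):
    """Internal date references: the most recent year mentioned in the page text."""
    years = [int(y) for y in YEAR_PATTERN.findall(text)]
    return max(years) if years else None


def last_modified_year(headers):
    """Meta-data (assumed source): the year from an HTTP Last-Modified header."""
    value = headers.get("Last-Modified")
    return parsedate_to_datetime(value).year if value else None


def dead_link_fraction(link_status):
    """Link status: link_status maps each outgoing URL to True if that link is dead."""
    if not link_status:
        return 0.0
    return sum(link_status.values()) / len(link_status)


def neighborhood_dead_fraction(neighbor_link_statuses):
    """Neighborhood: mean dead-link fraction over neighboring pages (assumed aggregation)."""
    fractions = [dead_link_fraction(s) for s in neighbor_link_statuses]
    return sum(fractions) / len(fractions) if fractions else 0.0


def looks_stale(text, link_status, today_year):
    """Flag a page whose newest date reference is too old or whose
    dead-link fraction reaches the (hypothetical) threshold."""
    year = latest_year_referenced(text)
    too_old = year is not None and today_year - year > MAX_AGE_YEARS
    too_many_dead = dead_link_fraction(link_status) >= DEAD_LINK_THRESHOLD
    return too_old or too_many_dead
```

Combining the signals with a simple OR is only one option; a weighted score that also folds in `last_modified_year` and `neighborhood_dead_fraction` would let no single noisy signal decide the outcome on its own.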



Description of Disclosure - DETX (27): 

[0045] A dead web page is a page that is not publicly available over the 
web. A page can be dead for any of the following reasons: (1) its URL is 
malformed; (2) its host is down or non-existent; or (3) it does not exist on 
the host. The first two types of dead pages are easy to detect: a malformed
URL fails URL parsing, and a down or non-existent host fails resolution of the
host address. When a page that does not exist on a host is requested, the
host's web server is supposed to return an error; typically this is the 404
(Not Found) HTTP status code. However, many web servers today do not return an
error code even when they receive HTTP requests for non-existent pages.
Instead, they return an OK code (200) together with a substitute page,
typically an error message page, the home page of that host, or even a
completely unrelated page. Non-existent pages for which a server responds in
this way are called "soft-404 pages".
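The paragraph defines three kinds of dead pages plus the soft-404 case but does not, at this point, say how a soft-404 is recognized. The Python sketch below classifies a URL along those lines. The soft-404 check is an assumed heuristic, not necessarily the method of the disclosure: it requests a randomly named path on the same host and reports a soft-404 if the server answers 200 with a byte-identical body. Exact equality is crude (pages that embed timestamps or echo the requested URL defeat it), so a practical system would more likely compare content by similarity. Note also that DNS resolution only catches non-existent host names; a host that resolves but refuses connections surfaces as a fetch failure. The function names, return labels, and the injectable `resolve`/`fetch` parameters are hypothetical; the injection lets the logic be exercised without network access.

```python
import socket
import uuid
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def http_fetch(url, timeout=10):
    """Return (status, body). HTTP error statuses are returned rather than raised;
    connection-level failures yield (None, b"")."""
    try:
        with urlopen(url, timeout=timeout) as resp:  # follows redirects by default
            return resp.status, resp.read()
    except HTTPError as err:  # must precede OSError: HTTPError subclasses it
        return err.code, err.read()
    except OSError:  # URLError, refused connections, timeouts
        return None, b""


def classify_page(url, resolve=socket.gethostbyname, fetch=http_fetch):
    """Return one of: malformed-url, bad-host, dead-404, other-error, soft-404, alive."""
    # (1) Malformed URL: fails parsing or lacks an http(s) scheme and host.
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return "malformed-url"
    if parts.scheme not in ("http", "https") or not host:
        return "malformed-url"
    # (2) Non-existent host: the host name does not resolve to an address.
    try:
        resolve(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "bad-host"
    status, body = fetch(url)
    if status is None:  # resolved but unreachable: treated as a down host
        return "bad-host"
    # (3) Missing page that the server correctly reports as 404.
    if status == 404:
        return "dead-404"
    if status != 200:
        return "other-error"
    # Assumed soft-404 heuristic: a random path should not exist on the host,
    # so an identical 200 response for it suggests a substitute page.
    probe = urlunsplit((parts.scheme, parts.netloc, "/" + uuid.uuid4().hex, "", ""))
    probe_status, probe_body = fetch(probe)
    if probe_status == 200 and probe_body == body:
        return "soft-404"
    return "alive"
```

With a stub `fetch` that answers every path on a host with `(200, b"home page")`, any missing URL on that host is labeled `"soft-404"`, while a host whose random probe returns 404 leaves its genuine pages labeled `"alive"`.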